Feat/hunspell cache api#20841
Conversation
PR Code Analyzer ❗ AI-powered 'Code-Diff-Analyzer' found issues on commit 0297edf: 'Diff too large, requires skip by maintainers after manual review'. Pull Request Author(s): please update your pull request according to the report above. Repository Maintainer(s): you can skip this check after manual review. Thanks.
Force-pushed from 0297edf to 05a74c4
PR Code Analyzer ❗ AI-powered 'Code-Diff-Analyzer' found issues on commit 05a74c4: 'Diff too large, requires skip by maintainers after manual review'. Pull Request Author(s): please update your pull request according to the report above. Repository Maintainer(s): you can skip this check after manual review. Thanks.
Pull request overview
This PR introduces Hunspell dictionary cache management APIs (info + invalidation) and wires them through REST → transport actions → HunspellService, alongside related Hunspell package-based loading (ref_path) and cache-utility additions needed to support these endpoints.
Changes:
- Add transport actions + REST handler for Hunspell cache info (`GET /_hunspell/cache`) and invalidation (`POST /_hunspell/cache/_invalidate`, `POST /_hunspell/cache/_invalidate_all`).
- Extend `HunspellService`/`HunspellTokenFilterFactory` with package-based dictionary loading and cache invalidation utilities.
- Update node/module wiring and add unit + integration tests and test dictionary resources.
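For reviewers who want to poke at the two endpoints, here is a minimal sketch that builds the corresponding HTTP requests with the JDK's built-in client. The base URL is an assumption, and nothing is sent over the network; the request shapes simply mirror the routes listed above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class HunspellCacheRequests {
    // Base URL is an assumption; adjust for your cluster.
    static final String BASE = "http://localhost:9200";

    // GET /_hunspell/cache — view cached dictionary keys
    static HttpRequest cacheInfo() {
        return HttpRequest.newBuilder(URI.create(BASE + "/_hunspell/cache")).GET().build();
    }

    // POST /_hunspell/cache/_invalidate_all — drop all cached dictionaries
    static HttpRequest invalidateAll() {
        return HttpRequest.newBuilder(URI.create(BASE + "/_hunspell/cache/_invalidate_all"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) {
        System.out.println(cacheInfo().method() + " " + cacheInfo().uri().getPath());
        System.out.println(invalidateAll().method() + " " + invalidateAll().uri().getPath());
    }
}
```

To actually send a request, pass one of these to `java.net.http.HttpClient.newHttpClient().send(...)` against a running cluster.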
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| server/src/main/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateAction.java | New REST handler for cache info + invalidation endpoints. |
| server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoAction.java | New action type for cache info. |
| server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoRequest.java | New transport request for cache info. |
| server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInfoResponse.java | New transport/XContent response for cache info. |
| server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/TransportHunspellCacheInfoAction.java | Transport handler to read Hunspell cache state. |
| server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateAction.java | New action type for cache invalidation. |
| server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateRequest.java | New invalidation request + validation logic. |
| server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheInvalidateResponse.java | New invalidation response with consistent schema fields. |
| server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/TransportHunspellCacheInvalidateAction.java | Transport handler performing invalidations via HunspellService. |
| server/src/main/java/org/opensearch/action/admin/indices/cache/hunspell/package-info.java | New package docs for hunspell cache actions. |
| server/src/main/java/org/opensearch/action/ActionModule.java | Registers new actions + REST handler; adds HunspellService binding. |
| server/src/main/java/org/opensearch/node/Node.java | Passes HunspellService into ActionModule. |
| server/src/main/java/org/opensearch/indices/analysis/AnalysisModule.java | Exposes getHunspellService() publicly for node wiring. |
| server/src/main/java/org/opensearch/indices/analysis/HunspellService.java | Adds package dictionary loading + cache key utilities + invalidation APIs. |
| server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java | Adds ref_path loading + validation + updateable analysis mode behavior. |
| server/src/internalClusterTest/java/org/opensearch/action/admin/indices/cache/hunspell/HunspellCacheIT.java | Adds cluster-level tests for transport request/response + schema/validation. |
| server/src/test/java/org/opensearch/rest/action/admin/indices/RestHunspellCacheInvalidateActionTests.java | Adds REST handler/unit tests for routes, request/response serialization, params. |
| server/src/test/java/org/opensearch/indices/analyze/HunspellServiceTests.java | Adds unit tests for package dictionary loading + caching + invalidation. |
| server/src/test/java/org/opensearch/index/analysis/HunspellTokenFilterFactoryTests.java | Adds tests for ref_path behavior, validation, and updateable mode. |
| server/src/test/java/org/opensearch/action/ActionModuleTests.java | Updates constructor calls for new ActionModule signature. |
| server/src/test/java/org/opensearch/extensions/rest/RestSendToExtensionActionTests.java | Updates constructor calls for new ActionModule signature. |
| server/src/test/resources/indices/analyze/conf_dir/packages/test-pkg/hunspell/en_US/en_US.aff | Adds test hunspell affix file for package-based dictionary loading. |
| CHANGELOG.md | Adds changelog entries for ref_path + cache API (currently with placeholder PR). |
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java (resolved review comment)
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java (resolved review comment)
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java (outdated, resolved review comment)
```java
/**
 * Sets up hunspell dictionary files in the node's config directory.
 * Creates both traditional and package-based dictionaries.
 */
private void setupDictionaries(Path configDir) throws IOException {
    // Traditional dictionary: config/hunspell/en_US/
    Path traditionalDir = configDir.resolve("hunspell").resolve("en_US");
    Files.createDirectories(traditionalDir);
    Files.write(traditionalDir.resolve("en_US.aff"), "SET UTF-8\n".getBytes(StandardCharsets.UTF_8));
    Files.write(traditionalDir.resolve("en_US.dic"), "1\nhello\n".getBytes(StandardCharsets.UTF_8));

    // Package-based dictionary: config/packages/test-pkg/hunspell/en_US/
    Path packageDir = configDir.resolve("packages").resolve("test-pkg").resolve("hunspell").resolve("en_US");
    Files.createDirectories(packageDir);
    Files.write(packageDir.resolve("en_US.aff"), "SET UTF-8\n".getBytes(StandardCharsets.UTF_8));
    Files.write(packageDir.resolve("en_US.dic"), "1\nworld\n".getBytes(StandardCharsets.UTF_8));
}
```
setupDictionaries(...) is currently unused, and the integration tests never load a dictionary into the cache. As written, these tests will still pass even if cache-info/invalidation behavior for non-empty caches is broken. Please either remove the unused helper or use it to set up dictionaries and add assertions for non-empty cache counts/keys before and after invalidation.
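The reviewer's suggested assertion pattern can be sketched independently of `HunspellService`. The map below is only a stand-in for the real dictionary cache, and the key format is hypothetical; the point is asserting a non-empty cache before invalidation so a broken invalidation path cannot pass trivially.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CacheInvalidationAssertionSketch {
    public static void main(String[] args) {
        // Stand-in for the service's dictionary cache; the real cache holds Dictionary values.
        ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();

        // 1. Load dictionaries (here: fake entries with hypothetical keys).
        cache.put("test-pkg/en_US", new Object());
        cache.put("en_US", new Object());

        // 2. Assert the cache is non-empty BEFORE invalidation, so the test
        //    cannot pass vacuously against an always-empty cache.
        if (cache.size() != 2) throw new AssertionError("expected populated cache");

        // 3. Invalidate, then assert the cache is empty afterwards.
        cache.clear();
        if (!cache.isEmpty()) throw new AssertionError("expected empty cache after invalidation");
        System.out.println("invalidation assertions passed");
    }
}
```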
PR Reviewer Guide 🔍 (Review updated until commit 99f2309). Here are some key observations to aid the review process:
PR Code Suggestions ✨ Latest suggestions up to 99f2309. Explore these optional code suggestions:
Previous suggestions
Suggestions up to commit 13ba03f
Suggestions up to commit da51569
Suggestions up to commit 04f3e4b
Suggestions up to commit 4f3ab0f
Suggestions up to commit b7ec9d0
I've fixed the doc to reflect the JDK 21 minimum requirement. I've also removed a few references to moving targets (i.e. which JDK is bundled) since that will frequently change and this doc will be wrong again. Signed-off-by: Andrew Ross <andrross@amazon.com>
Race condition between request completion and task resource tracking cleanup. The sequence of events:
1. Task is cancelled via `CancelTasksRequest`
2. The node operation throws `TaskCancelledException`
3. The response is sent back to the caller, which counts down `requestCompleteLatch`
4. The test's main thread wakes up from `requestCompleteLatch.await()` and asserts `resourceTasks.size() == 0`
5. Meanwhile, `TaskResourceTrackingService.stopTracking()` (which calls `resourceAwareTasks.remove()`) is invoked asynchronously via a `resourceTrackingCompletionListener` registered in `TaskManager.register()`

Steps 4 and 5 race. I was able to reproduce the failure locally using `stress-ng` and verify this fix. Signed-off-by: Andrew Ross <andrross@amazon.com>
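The polling idea behind this kind of fix can be sketched as follows. This is a simplified stand-in, not the actual test code: a background thread removes the tracking entry asynchronously (mirroring the completion listener), and the main thread polls until the condition holds instead of asserting immediately after the latch, the same idea as OpenSearch's `assertBusy` helper.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ResourceTrackingRaceSketch {
    public static void main(String[] args) throws Exception {
        ConcurrentMap<Long, String> resourceTasks = new ConcurrentHashMap<>();
        resourceTasks.put(42L, "cancelled-task");

        // Cleanup runs on another thread AFTER the response is sent,
        // mirroring the async resourceTrackingCompletionListener.
        Thread cleanup = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            resourceTasks.remove(42L);
        });
        cleanup.start();

        // Asserting size == 0 right here would race with the cleanup thread.
        // Instead, poll until the condition holds or a deadline passes.
        long deadline = System.nanoTime() + 5_000_000_000L; // 5 second timeout
        while (!resourceTasks.isEmpty()) {
            if (System.nanoTime() > deadline) throw new AssertionError("tracking entry never cleaned up");
            Thread.sleep(10);
        }
        System.out.println("resource tracking cleaned up");
    }
}
```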
RemoteShardsBalancer.balance() only rebalances by primary shard count, not total shard count. In tight-capacity scenarios where cluster-wide shard limits leave minimal spare slots, this prevents the balancer from redistributing replicas to free space on other nodes, leaving assignable shards unassigned. I believe this is the cause of the flakiness in ShardsLimitAllocationDeciderIT. This change introduces a unit test that deliberately targets the tightly packed scenario. The RemoteShardsBalancer variant of the new test will reliably fail if run repeatedly, though there is still some non-determinism. Regardless, it is muted along with the existing integration test. If/when we improve the intelligence of RemoteShardsBalancer then we can unmute these tests. For now the tests exist to document this known limitation. Signed-off-by: Andrew Ross <andrross@amazon.com>
…ct#20673) Previously the code to capture and dedup error and warn messages from test nodes would remove the timestamp from the message. This made debugging timing issues difficult. For example, the message would look like:

```
» WARN ][o.o.d.FileBasedSeedHostsProvider] [v2.19.5-0] expected, but did not find, a dynamic hosts list at [/home/ubuntu/workplace/opensearch-project/OpenSearch/qa/rolling-upgrade/build/testclusters/v2.19.5-0/config/unicast_hosts.txt]
» ↑ repeated 2 times ↑
```

With this change it looks like:

```
» [2026-02-18T20:04:29,167][WARN ][o.o.d.FileBasedSeedHostsProvider] [v2.19.5-0] expected, but did not find, a dynamic hosts list at [/home/ubuntu/workplace/opensearch-project/OpenSearch/qa/rolling-upgrade/build/testclusters/v2.19.5-0/config/unicast_hosts.txt]
» ↑ repeated 2 times ↑
```

Signed-off-by: Andrew Ross <andrross@amazon.com>
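The dedup approach described above can be sketched as: compare adjacent lines with the timestamp stripped, but emit each kept line verbatim, timestamp included. The timestamp regex and method names here are assumptions, not the actual test-cluster code.

```java
import java.util.ArrayList;
import java.util.List;

public class LogDedupSketch {
    // Matches a leading "[2026-02-18T20:04:29,167]"-style timestamp (assumed format).
    private static final String TS = "^\\[\\d{4}-\\d{2}-\\d{2}T[^\\]]+\\]";

    static List<String> dedup(List<String> lines) {
        List<String> out = new ArrayList<>();
        String prevKey = null;
        int repeats = 0;
        for (String line : lines) {
            String key = line.replaceFirst(TS, ""); // compare without the timestamp...
            if (key.equals(prevKey)) {
                repeats++;
                continue;
            }
            if (repeats > 0) out.add("  ↑ repeated " + repeats + " times ↑");
            out.add(line); // ...but print the kept line verbatim, timestamp included
            prevKey = key;
            repeats = 0;
        }
        if (repeats > 0) out.add("  ↑ repeated " + repeats + " times ↑");
        return out;
    }

    public static void main(String[] args) {
        dedup(List.of(
            "[2026-02-18T20:04:29,167][WARN ] did not find hosts list",
            "[2026-02-18T20:04:29,201][WARN ] did not find hosts list",
            "[2026-02-18T20:04:30,000][INFO ] started"
        )).forEach(System.out::println);
    }
}
```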
* Upgrade to Gradle 9.4

  Also replace the eager cross-project task reference to :libs:agent-sm:agent with Gradle-idiomatic patterns:
  - Add a consumable agentDist configuration in the agent project that publishes the prepareAgent output directory as an artifact with a Category attribute.
  - Add a matching resolvable agent configuration in the distribution subprojects to consume it via normal dependency resolution.
  - Replace direct task references (project.prepareAgent, project.jar) with lazy alternatives: tasks.named() for task providers, lazy GStrings for deferred path resolution, and closure-based dependsOn.

  This removes the need for an evaluationDependsOn call which forced the agent project to be configured before the distribution project, violating Gradle best practices around project isolation and configuration-time coupling.

  Signed-off-by: Andrew Ross <andrross@amazon.com>

* Refactor the agent wiring to use configurations

  Signed-off-by: Andriy Redko <drreta@gmail.com>

* Use .singleFile and lazy evaluation for agentJar

  Signed-off-by: Andrew Ross <andrross@amazon.com>

---------

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Andriy Redko <drreta@gmail.com>
Co-authored-by: Andriy Redko <drreta@gmail.com>
…earch-project#20745)

* Add Virtual Shards routing and mapping overrides

  Implements the initial routing foundation for Virtual Shards. Adds settings validation for virtual shards and routing overrides via IndexMetadata custom data.

* Add Virtual Shards routing and mapping overrides

  This PR adds the initial routing and metadata groundwork for virtual shards. When enabled, routing now uses a virtual shard id (vShardId) before resolving to a physical shard. This separates hash-space size from physical shard count and prepares the path for future shard movement workflows.

  What's included:
  - New static index setting: index.number_of_virtual_shards (default -1, disabled).
  - Routing update in OperationRouting.generateShardId: virtual-shard path when enabled; existing behavior unchanged when disabled.
  - New VirtualShardRoutingHelper for vShardId -> physical shard resolution.
  - Optional per-index override support via virtual_shards_routing custom metadata.
  - Test coverage for: enabled/disabled routing behavior, validation rules, override and fallback behavior.

  Out of scope:
  - Side-car segment extraction flow.
  - Transport/state-orchestration for managing override lifecycle.

* Refactor Virtual Shards to use range-based routing

* Add fail-safe to VirtualShardRoutingHelper and valid configuration tracking test

* Fixed review comments and move to range based sharding

---------

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
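The range-based vShardId-to-physical-shard idea from the commits above can be sketched as follows. The exact formula used by VirtualShardRoutingHelper is not shown in this thread, so treat the mapping below (each physical shard owning a contiguous range of virtual shard ids) as an illustration, not the implementation.

```java
public class VirtualShardRoutingSketch {
    // Map a routing hash into the virtual-shard hash space.
    static int generateVShardId(int routingHash, int numVirtual) {
        return Math.floorMod(routingHash, numVirtual);
    }

    // Contiguous-range resolution: each physical shard owns
    // numVirtual / numPhysical consecutive vShardIds (illustrative formula).
    static int resolvePhysicalShard(int vShardId, int numVirtual, int numPhysical) {
        if (vShardId < 0 || vShardId >= numVirtual) {
            throw new IllegalArgumentException("vShardId out of range");
        }
        return (int) ((long) vShardId * numPhysical / numVirtual);
    }

    public static void main(String[] args) {
        int numVirtual = 8, numPhysical = 2;
        // vShardIds 0..3 land on physical shard 0; 4..7 on physical shard 1.
        for (int v = 0; v < numVirtual; v++) {
            System.out.println("vShard " + v + " -> physical " + resolvePhysicalShard(v, numVirtual, numPhysical));
        }
    }
}
```

Because the hash space is fixed by numVirtual rather than by the physical shard count, physical shards can later be split or moved without rehashing documents.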
When a patch release (e.g., 2.19.5) is published and the release branch is bumped to the next patch (2.19.6), BWC tests on main fail because Version.java still references the old patch version. This causes all in-flight PRs to fail until Version.java is updated. This change relaxes the logic so that BWC tests will still pass if the checked out code uses a patch version one greater than expected. This prevents CI failures every time a release branch increments its patch version, but still prevents the main branch from drifting by more than one patch version. Signed-off-by: Andrew Ross <andrross@amazon.com>
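The relaxed check described above amounts to: accept the expected patch version or exactly one greater, within the same major.minor. This is an illustration of that rule, not the actual build-script logic.

```java
public class BwcPatchToleranceSketch {
    /**
     * Accept the expected patch version or one greater, so CI does not break
     * when a release branch bumps its patch before main's Version.java catches up,
     * while still preventing drift of more than one patch version.
     */
    static boolean isAcceptable(int expectedMajor, int expectedMinor, int expectedPatch,
                                int major, int minor, int patch) {
        if (major != expectedMajor || minor != expectedMinor) return false;
        return patch == expectedPatch || patch == expectedPatch + 1;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable(2, 19, 5, 2, 19, 5)); // exact match: accepted
        System.out.println(isAcceptable(2, 19, 5, 2, 19, 6)); // one patch ahead: tolerated
        System.out.println(isAcceptable(2, 19, 5, 2, 19, 7)); // drifted too far: rejected
    }
}
```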
Force-pushed from a1a8a58 to 27502dd
Persistent review updated to latest commit 27502dd
Force-pushed from 27502dd to b7ec9d0
Persistent review updated to latest commit b7ec9d0
Persistent review updated to latest commit 4f3ab0f
❌ Gradle check result for 4f3ab0f: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Persistent review updated to latest commit 04f3e4b
Force-pushed from 04f3e4b to da51569
Persistent review updated to latest commit da51569
❌ Gradle check result for da51569: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
…ect#20697)

* initial commit of extensible query engine plugin to sandbox
* clean up build.gradle - update forbidden-dependencies to skip guava check in sandbox plugins; calcite requires this dependency at compile time
* Rename plugin interfaces and default implementations. Wire up a ppl front-end using UnifiedQueryAPI from sql plugin.
* refactor to plugin-plugin SPI
* add readmes and start some clean up
* analyzer errors
* move fe plugin into analytics plugin for testing only, we will use sql plugin. also remove "hub" plugin.
* spotless
* clean up
* more clean up
* fixing analyzer issues
* fix javadoc
* fix guava forbidden check
* fix license check
* fix javadoc warning on transitive dependency
* clean up build.gradle and fix weird javadoc issues with dependencies
* fix calcite/guava dependencies
* fix package name
* remove EngineCapabilities, just use calcite's sqloperatortable; wraps this and schema in an engineContext provided to front-ends
* simplify unified IT to use params
* fix guava NOTICE file to exactly match the file from grpc
* javadoc fix
* Update sandbox/plugins/analytics-engine/README.md (Co-authored-by: Andrew Ross <andrross@amazon.com>)

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: shayush622 <ayush5267@gmail.com>
Force-pushed from da51569 to 6a31779
Persistent review updated to latest commit 13ba03f
❌ Gradle check result for 13ba03f: null. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Persistent review updated to latest commit 99f2309
❌ Gradle check result for 99f2309: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Closing this PR. Cache management REST API endpoints are deferred per maintainer's feedback. Key takeaways from design discussion with @cwperks:
Description
This PR adds REST API endpoints for hunspell dictionary cache management. It introduces two endpoints for viewing and invalidating cached hunspell dictionaries.
Endpoints:
- `GET /_hunspell/cache` — View all cached dictionary keys (requires `cluster:monitor/hunspell/cache` permission)
- `POST /_hunspell/cache/_invalidate` — Invalidate by `package_id`, `locale`, `cache_key`, or `invalidate_all`
- `POST /_hunspell/cache/_invalidate_all` — Invalidate all cached dictionaries

Key changes:
This is Part 2 of 2 — depends on #20840 (ref_path core support).
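Since the invalidate endpoint accepts several alternative parameters, its request validation can be sketched as below. The field names mirror the endpoint parameters, but the exactly-one-criterion rule is an assumption about HunspellCacheInvalidateRequest, not confirmed by this thread.

```java
public class InvalidateRequestValidationSketch {
    // Hypothetical mirror of HunspellCacheInvalidateRequest's fields.
    String packageId;
    String locale;
    String cacheKey;
    boolean invalidateAll;

    /** Returns an error message, or null when the request is valid. */
    String validate() {
        int criteria = 0;
        if (packageId != null) criteria++;
        if (locale != null) criteria++;
        if (cacheKey != null) criteria++;
        if (invalidateAll) criteria++;
        if (criteria == 0) return "one of [package_id, locale, cache_key, invalidate_all] is required";
        if (criteria > 1) return "only one invalidation criterion may be set";
        return null;
    }

    public static void main(String[] args) {
        InvalidateRequestValidationSketch r = new InvalidateRequestValidationSketch();
        System.out.println(r.validate()); // no criterion set: error message
        r.locale = "en_US";
        System.out.println(r.validate()); // exactly one criterion: null (valid)
    }
}
```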
Related Issues
Resolves #20712
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.